The R Primer by Claus Thorn Ekstrøm
                                        9.445   < 2e-16 ***
pcPC2         0.1592     0.5050         0.315  0.752540
pcPC3        -0.7191     0.3273        -2.197  0.028032 *
pcPC4         0.9151     0.3691         2.479  0.013159 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 884.35  on 682  degrees of freedom
Residual deviance: 106.12  on 678  degrees of freedom
AIC: 116.12

Number of Fisher Scoring iterations: 8
We see from the logistic regression analysis that the first principal component is highly significant, while principal components 3 and 4 are barely significant. Principal component 2 is not significant when the three other principal components are part of the model, so even though principal component 2 explains the second-largest share of the variation of the explanatory variables, it has no effect on the response.
See also: See Rule 3.31 for principal component analysis and Rule 3.8 for
logistic regression modelling. The pcr function from the pls package
can also be used for principal component regression.
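As a hedged sketch of how pcr might be called (the data frame df and its response y below are hypothetical placeholders, not objects from the example above):

> library(pls)
> fit.pcr <- pcr(y ~ ., data=df, ncomp=4, scale=TRUE, validation="CV")
> summary(fit.pcr)            # variance explained and cross-validated RMSEP
> predict(fit.pcr, ncomp=2)   # predictions using only the first two components

Setting validation="CV" makes summary report cross-validated prediction error, which helps choose how many components to keep.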
3.33 Classify observations using linear discriminant analysis
Problem: You want to find a linear combination of features which can
be used for classification of two or more classes.
Solution: Discriminant analysis is a statistical technique for classification of data into mutually exclusive groups. In linear discriminant analysis we assume that the groups can be separated by a linear combination of the features that describe the objects, and with k groups we need k − 1 discriminators to separate the classes.
The function lda from the MASS package can be used for linear discriminant analysis. Input for the lda function is a model formula of the form group ~ x1 + x2 + · · · where the response group is a grouping factor and x1, x2, . . . are quantitative discriminators. The prior option can be set to give the prior probabilities of class membership. If it is unspecified, the probabilities of class membership are estimated from the dataset.
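As a quick illustration of the k − 1 discriminators (this example is not from the text; it uses the iris data shipped with R), the three species give k = 3 groups and hence two linear discriminants:

> library(MASS)
> fit.iris <- lda(Species ~ ., data=iris)   # k = 3 groups
> fit.iris$scaling                          # two columns: LD1 and LD2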
In the following code, we will make a model to classify breast cancer type (“benign” or “malignant”) based on tumor clump thickness (V1), uniformity of cell size (V2), and uniformity of cell shape (V3). The variables are found in the biopsy data frame from the MASS package.
> library(MASS)
> data(biopsy)
> fit <- lda(class ~ V1 + V2 + V3, data=biopsy)
> fit
Call:
lda(class ~ V1 + V2 + V3, data = biopsy)
Prior probabilities of groups:
benign malignant
0.6552217 0.3447783
Group means:
                V1       V2       V3
benign    2.956332 1.325328 1.443231
malignant 7.195021 6.572614 6.560166
Coefficients of linear discriminants:
LD1
V1 0.2321486
V2 0.2574805
V3 0.2500765
> plot(fit, col="lightgray")
The prior probability of the groups and the resulting linear discriminator are both seen in the output.

Figure 3.14: Example of lda output. Histograms of the values of the linear discriminant are shown for observations from both the “benign” and the “malignant” group.
The plot of the fitted model is shown in Figure 3.14, and the type of plot produced depends on the number of discriminators. If there is only one discriminator, or if the argument dimen=1 is set, then a histogram is plotted; if there are two or more discriminators, then a pairs plot is shown.
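As mentioned above, the prior option overrides the class probabilities estimated from the data. A minimal sketch, refitting the biopsy model (the equal priors here are an assumption chosen purely for illustration):

> fit2 <- lda(class ~ V1 + V2 + V3, data=biopsy,
+             prior=c(0.5, 0.5))   # equal priors instead of sample proportions
> fit2$prior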
The predict function provides a list of predicted classes (the class
component) and posterior probabilities (the posterior component)
when the result from lda is supplied as input. These can be used to
evaluate the sensitivity and specificity of the classification.
> result <- table(biopsy$class, predict(fit)$class)
> result
            benign malignant
  benign       448        10
  malignant     33       208
> sum(diag(result)) / sum(result)
[1] 0.9384835
We can see here that the linear discriminant analysis correctly classifies 93.85% of the observations in the biopsy dataset.
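Following up on the sensitivity and specificity mentioned above, a small sketch computes both from the result table, treating “malignant” as the positive class (this follow-up is not part of the original example; rows of result hold the true classes, columns the predicted ones):

> result["malignant", "malignant"] / sum(result["malignant", ])  # sensitivity
> result["benign", "benign"] / sum(result["benign", ])           # specificity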